[cDAC] Implement DacDbi GetNativeCodeInfo / GetNativeCodeInfoForAddr#128338
[cDAC] Implement DacDbi GetNativeCodeInfo / GetNativeCodeInfoForAddr#128338rcj1 wants to merge 13 commits into
Conversation
Co-authored-by: rcj1 <77995559+rcj1@users.noreply.github.com>
|
Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag |
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
This PR adds cDAC support for DacDbi native-code lookup APIs and introduces a lightweight EnC data contract/list so debugger-side code can retrieve native code region and EnC version information without walking debugger method-info tables.
Changes:
- Implements managed cDAC
GetNativeCodeInfo/GetNativeCodeInfoForAddrand related interop struct shape. - Adds EnC data descriptors, contract registration, native
ModuleEnC data storage, and DAC lookup changes. - Extends loader/runtime-type-system contracts for member refs and async variant resolution, with design docs.
Show a summary per file
| File | Description |
|---|---|
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/Dbi/IDacDbiInterface.cs |
Updates DacDbi signatures and mirrors native code-info data. |
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/Dbi/DacDbiImpl.cs |
Implements native-code-info lookup in cDAC. |
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/EcmaMetadataUtils.cs |
Adds mdtMemberRef token type. |
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/DataType.cs |
Adds EnCData descriptor type. |
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/Module.cs |
Reads optional EnC data list from modules. |
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/EnCData.cs |
Adds managed view over native EnC data nodes. |
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/CoreCLRContracts.cs |
Registers the EnC contract implementation. |
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/RuntimeTypeSystem_1.cs |
Adds async variant resolution support. |
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/Loader_1.cs |
Adds member-ref-to-method lookup helper. |
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/EnC_1.cs |
Implements EnC version lookup contract. |
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Constants.cs |
Adds default EnC version global name. |
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IRuntimeTypeSystem.cs |
Exposes async variant contract API. |
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/ILoader.cs |
Exposes member-ref lookup contract API. |
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IEnC.cs |
Adds public EnC contract abstraction. |
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/ContractRegistry.cs |
Adds EnC contract accessor. |
src/coreclr/vm/datadescriptor/datadescriptor.inc |
Publishes EnC descriptors, global, and contract. |
src/coreclr/vm/ceeload.h |
Adds native EnC data list and lookup helpers to Module. |
src/coreclr/debug/inc/dacdbistructures.h |
Widens code-info EnC version field. |
src/coreclr/debug/ee/functioninfo.cpp |
Records EnC data when JIT info is initialized. |
src/coreclr/debug/di/module.cpp |
Casts widened EnC version for DI objects. |
src/coreclr/debug/di/divalue.cpp |
Casts widened EnC version for function lookup. |
src/coreclr/debug/daccess/dacdbiimpl.h |
Updates EnC lookup helper signature. |
src/coreclr/debug/daccess/dacdbiimpl.cpp |
Switches native DAC EnC lookup to new module list. |
docs/design/datacontracts/RuntimeTypeSystem.md |
Documents async variant contract behavior. |
docs/design/datacontracts/Loader.md |
Documents member-ref lookup helper. |
docs/design/datacontracts/EnC.md |
Adds EnC contract design documentation. |
Copilot's findings
- Files reviewed: 26/26 changed files
- Comments generated: 11
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Co-authored-by: rcj1 <77995559+rcj1@users.noreply.github.com>
| this->m_encVersion = this->m_methodInfo->GetCurrentEnCVersion(); | ||
| #ifdef FEATURE_METADATA_UPDATER | ||
| if (this->m_encVersion != CorDB_DEFAULT_ENC_FUNCTION_VERSION) | ||
| { | ||
| Module* pModule = this->m_pLoaderModule; | ||
| EnCData* pEnCData = (EnCData*)(void*)pModule->GetLoaderAllocator()->GetLowFrequencyHeap()->AllocMem(S_SIZE_T(sizeof(EnCData))); | ||
| pEnCData->addrOfCode = (TADDR)this->m_addrOfCode; | ||
| pEnCData->token = this->m_methodInfo->m_token; | ||
| pEnCData->encVersion = this->m_encVersion; | ||
| pModule->AddEncData(pEnCData); |
There was a problem hiding this comment.
good catch. Only possibly relevant in
runtime/src/coreclr/debug/di/divalue.cpp
Line 2748 in 3206a8e
| uint coldSize = 0; | ||
| if (cbh is not null) | ||
| { | ||
| em.GetMethodRegionInfo(cbh.Value, out hotSize, out coldStart, out coldSize); |
829f44a to
7f0ded6
Compare
|
Caution Security scanning requires review for Code Review DetailsThe threat detection results could not be parsed. The workflow output should be reviewed before merging. Review the workflow run logs for details. Note This review was generated by Copilot. 🤖 Copilot Code Review — PR #128338Holistic AssessmentMotivation: The PR is well-motivated. It restructures debugger-side EnC version lookup to avoid walking all Approach: The approach is sound — a simple linked list on Summary: Detailed Findings
|
Co-authored-by: rcj1 <77995559+rcj1@users.noreply.github.com>
Co-authored-by: rcj1 <77995559+rcj1@users.noreply.github.com>
Co-authored-by: rcj1 <77995559+rcj1@users.noreply.github.com>
Co-authored-by: rcj1 <77995559+rcj1@users.noreply.github.com>
Co-authored-by: rcj1 <77995559+rcj1@users.noreply.github.com>
Co-authored-by: rcj1 <77995559+rcj1@users.noreply.github.com>
| void AddEncData(EnCData* pData) | ||
| { | ||
| LIMITED_METHOD_CONTRACT; | ||
| pData->pNext = m_pEnCDataList; |
| return nullptr; | ||
| } | ||
|
|
||
| PTR_EnCData FindLatestEncData(mdMethodDef token) |
There was a problem hiding this comment.
How does this work with generic methods? Generic methods can have multiple instantiations.
There was a problem hiding this comment.
Hm, the thing is this doesn’t look like it ever worked to distinguish generic instantiations when getting the latest EnC version. I wonder if we even need/use this?
There was a problem hiding this comment.
Currently I see FindLatestEncData getting used in CordbObjectValue::GetFunctionHelper but it doesn't have to be. Instead of calling CordbModule::LookupOrCreateFunction(token,encVersion) where encVersion ultimately gets calculated in this call you can instead call CordbModule::LookupOrCreateFunctionLatestVersion(token). That LatestVersion variant of the API already has another independent path to discover the latest (potentially unjitted) ENC version based on Debugger::UpdateFunction sending the new ENC version in updates.
Aside from CordbObjectValue::GetFunctionHelper I only see one other callsite for GetNativeCodeInfo and that one doesn't look at the ENC version information that is returned. You could modify GetNativeCodeInfo to only return exactly what that caller needs (MethodDesc+CodeStartAddr) which will eliminate the only callsite that cared about the pLatestEnCVersion param on LookupEnCVersions(). Then you can remove that parameter and this method.
There was a problem hiding this comment.
That sounds good. @jkotas it looks like generics in EnC are handled here, in that edits are made per-method and later applied per-instantiation:
runtime/src/coreclr/vm/encee.cpp
Line 376 in 5996b12
| SUPPORTS_DAC; | ||
| for (PTR_EnCData pCur = m_pEnCDataList; pCur != nullptr; pCur = pCur->pNext) | ||
| { | ||
| if (pCur->token == token && pCur->addrOfCode == addrOfCode) |
There was a problem hiding this comment.
token match should be unnecessary. The code should be enough to identify the method.
| return nullptr; | ||
| } | ||
|
|
||
| PTR_EnCData FindLatestEncData(mdMethodDef token) |
There was a problem hiding this comment.
This will find the most recently JITed method body. It may not be the one with the highest EnC version. My guess is that you expect this is to find the highest EnC version. Is that right?
There was a problem hiding this comment.
Yes, copilot pointed this discrepancy out here #128338 (comment). I think I’ll have to bring the mutation of the linked list up next to SetCurrentEnCVersion
There was a problem hiding this comment.
We can refactor to remove the dependency on this method entirely (https://github.com/dotnet/runtime/pull/128338/changes#r3265486372)
| return nullptr; | ||
| } | ||
|
|
||
| PTR_EnCData FindLatestEncData(mdMethodDef token) |
There was a problem hiding this comment.
Currently I see FindLatestEncData getting used in CordbObjectValue::GetFunctionHelper but it doesn't have to be. Instead of calling CordbModule::LookupOrCreateFunction(token,encVersion) where encVersion ultimately gets calculated in this call you can instead call CordbModule::LookupOrCreateFunctionLatestVersion(token). That LatestVersion variant of the API already has another independent path to discover the latest (potentially unjitted) ENC version based on Debugger::UpdateFunction sending the new ENC version in updates.
Aside from CordbObjectValue::GetFunctionHelper I only see one other callsite for GetNativeCodeInfo and that one doesn't look at the ENC version information that is returned. You could modify GetNativeCodeInfo to only return exactly what that caller needs (MethodDesc+CodeStartAddr) which will eliminate the only callsite that cared about the pLatestEnCVersion param on LookupEnCVersions(). Then you can remove that parameter and this method.
| return nullptr; | ||
| } | ||
|
|
||
| PTR_EnCData FindLatestEncData(mdMethodDef token) |
There was a problem hiding this comment.
We can refactor to remove the dependency on this method entirely (https://github.com/dotnet/runtime/pull/128338/changes#r3265486372)
| { | ||
| LIMITED_METHOD_CONTRACT; | ||
| SUPPORTS_DAC; | ||
| for (PTR_EnCData pCur = m_pEnCDataList; pCur != nullptr; pCur = pCur->pNext) |
There was a problem hiding this comment.
I'm worried about the performance overhead of an O(N) linked list search. Although I suspect many uses of ENC are small I'm not confident that all of them are and this method can get called a lot.
Rather than a new linked list, how about we add the ENC version to the method DebugInfo? We could add an ENCVersion as the 7th item in the FAT header. This would have no cost for most methods which use the slim header or 1 nibble per method for those using the FAT header.
There was a problem hiding this comment.
If we don't have to keep track of unjitted methods, we could stuff the EnC version into the code header.
There was a problem hiding this comment.
I do not think it belongs to code header directly.
I think having in the side-table where it is today or in the debug info as suggested by Noah are reasonable options.
We do have yet another code version scheme for tiered compilation and Rejit. Rejit is very much like EnC. Would it make sense to store the EnC version in the existing structure used by tiered compilation and Rejit? (ILCodeVersionNode, NativeCodeVersionNode, etc.)
There was a problem hiding this comment.
Would it make sense to store the EnC version in the existing structure used by tiered compilation and Rejit?
Architecturally yes, it makes sense to have it there. I didn't initially propose that because I worried it might be a more elaborate change, but its where I envisioned ENC would be some day. If it winds up being not too hard to add it there right now thats very nice. I am guessing some of the work along that route would be:
- ENC currently uses independent logic to determine the active version and switch between versions
- ICorDebug is aware of ILCodeVersions but currently they have different treatment than ENC versions. Right now ILCodeVersions generate multiple ICorDebugILCode objects parented under the same ICorDebugFunction whereas ENC expects multiple unique ICorDebugFunctions. Assuming new ENCVersions get represented at runtime as new ILCodeVersionNodes we'd need to teach ICorDebug to distinguish them.
There was a problem hiding this comment.
If we are just concerned about performance, a cheap alternative would be to add a lookup map keyed by method token on the module, and have it store a pointer to per-token linked lists of (codeAddr, version) - the same structure used in ILCodeVersions. I am not sure how much work it would be to completely unify EnC and ReJIT, or if maybe there is a middle ground where we add a node but say "actually this is an EnC version".
There was a problem hiding this comment.
if maybe there is a middle ground where we add a node but say "actually this is an EnC version".
I think it would be a significant improvement compared to what we have today.
There was a problem hiding this comment.
if maybe there is a middle ground where we add a node but say "actually this is an EnC version".
I think it would be a significant improvement compared to what we have today.
I can see us introducing ILCodeVersions for each EnC edit, each of which would get a singular NativeCodeVersion. We would not fold the actual logic of EnC into the logic for CodeVersions, which means that the existing iterators over CodeVersions would have to be updated to skip EnC nodes. We could even fold the APIs I just added on the EnC contract into the CodeVersions contract; it would be iterating the same data with a different filter.
Do you think this makes sense or would we have to go further in integrating the two for it to make sense?
There was a problem hiding this comment.
I don't believe finding EnC methods, fields, etc would be easily folded into CodeVersions. So we would still need an EnC contract until this goes away.
There was a problem hiding this comment.
Do you think this makes sense or would we have to go further in integrating the two for it to make sense?
It makes sense to me. I understand that ReJIT and EnC are exposed differently in the public APIs, so it is hard to make them handled uniformly.
finding EnC methods, fields, etc would be easily folded into CodeVersions. So we would still need an EnC contract until this goes away.
It is fine for these to be separate.
There was a problem hiding this comment.
We would not fold the actual logic of EnC into the logic for CodeVersions, which means that the existing iterators over CodeVersions would have to be updated to skip EnC nodes.
I'm not sure what boundary this implies exactly. Here is my initial thought which may or may not match what you had in mind?
- CodeVersionManager::IsMethodSupported(), MethodDesc::IsVersionable() and IsVersionableWithPrecode() would now return TRUE for editable methods in ENC enabled modules.
- MethodDesc::DetermineAndSetIsEligibleForTieredCompilation() would continue to return FALSE for methods in ENC enabled modules
- ReJIT remains incompatible with ENC enabled modules. ReJitManager::UpdateActiveILVersions would continue to return a failure.
- Each ENC method version would have a corresponding ILCodeVersion. The ILCodeVersion would have a single associated NativeCodeVersion.
- CodeVersionManager::GetILCodeVersions and CodeVersionManager::GetNativeCodeVersions would enumerate the ENC versions. Upstream consumers such as ReJitManager::GetReJITIDs() would filter the enumeration to avoid exposing ENC versions as if they had been created by ReJIT.
- MethodDesc::ResetCodeEntryPointForEnC() would go away. Switching to a new ENC version would use CodeVersionManager::SetActiveILCodeVersions(). Not sure if CodeVersionManager is the place where we'd enforce it, but I'd expect each module-level ENC edit always sets the most recent IL version of the edited methods as active and never reverts back.
- CodeVersionManager::GetActiveILCodeVersion would return the active ENC version of the method, whatever it is.
- We would stop using the Module::SetDynamicIL() mechanism to store ENC IL pointers and store the IL code pointer in the ILCodeVersion instead.
If you investigate and find this would run into issues please do raise it - this was just my initial guess at what might work well.
DebuggerMethodInfos and then theDebuggerJitInfos to get this data. It was noted that this is quite heavy just to get this one piece of data. We don't have that many EnC versions, so we now have a linked list of these stored on theModule.GetNativeCodeInfoandGetNativeCodeInfoForAddr.note to ccr: do not talk about changing com interface compatibility or compatibility with older datadescriptors.